Non-Overlapping LZ77 Factorization and LZ78 Substring Compression Queries with Suffix Trees
Abstract
We present algorithms computing the non-overlapping Lempel–Ziv-77 factorization and the longest previous factor table within small space in linear or near-linear time with the help of modern suffix tree representations fitting into limited space. With similar techniques, we show how to answer substring compression queries for Lempel–Ziv-78 with a possible logarithmic multiplicative slowdown depending on the representation used.
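To make the objects named in the abstract concrete, the following is a minimal sketch in Python. It is not the paper's suffix-tree method: it uses naive substring search, runs in polynomial rather than linear time, and ignores space bounds. The function names are hypothetical and chosen only for illustration. It assumes the usual definitions: the non-overlapping longest previous factor value at position i is the length of the longest prefix of text[i:] that occurs entirely inside text[:i], and each LZ78 factor is the shortest prefix of the remaining text that is not yet a factor.

```python
def lpf_non_overlapping(text: str) -> list[int]:
    """Naive non-overlapping longest-previous-factor table (illustrative only)."""
    n = len(text)
    table = []
    for i in range(n):
        length = 0
        # Extend while the candidate occurs entirely inside text[:i],
        # so the earlier occurrence cannot overlap position i.
        while i + length < n and text[i:i + length + 1] in text[:i]:
            length += 1
        table.append(length)
    return table


def lz77_non_overlapping(text: str) -> list[str]:
    """Greedy non-overlapping LZ77 factorization derived from the table above."""
    lpf = lpf_non_overlapping(text)
    factors = []
    i = 0
    while i < len(text):
        # A character with no earlier occurrence forms a factor of length 1.
        step = max(lpf[i], 1)
        factors.append(text[i:i + step])
        i += step
    return factors


def lz78(text: str) -> list[str]:
    """Naive LZ78 factorization: each factor is a previous factor plus one character."""
    seen = set()
    factors = []
    current = ""
    for ch in text:
        current += ch
        if current not in seen:
            seen.add(current)
            factors.append(current)
            current = ""
    if current:
        # The last factor may repeat an earlier one when the text runs out.
        factors.append(current)
    return factors


def lz78_substring_query(text: str, start: int, end: int) -> list[str]:
    """Naive substring compression query: LZ78 factors of text[start:end]."""
    return lz78(text[start:end])
```

For example, `lz77_non_overlapping("aaaa")` yields the factors `a`, `a`, `aa`, because a factor may only copy from text that ends before it starts. A substring compression query in this sketch simply refactorizes the requested substring from scratch, which is exactly the cost an indexed approach aims to avoid.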
Similar resources
Substring Alignment Using Suffix Trees
Alignment of the sentences of an original text and a translation is considerably better understood than alignment of smaller units such as words and phrases. This paper makes some preliminary proposals for solving the problem of aligning substrings that should be treated as basic translation units even though they may not begin and end at word boundaries. The proposals make crucial use of suff...
Substring Suffix Selection
We study the following substring suffix selection problem: given a substring of a string T of length n, compute its k-th lexicographically smallest suffix. This is a natural generalization of the well-known question of computing the maximal suffix of a string, which is a basic ingredient in many other problems. We first revisit two special cases of the problem, introduced by Babenko, Kolesnichenko...
The Efficient Computation of Complete and Concise Substring Scales with Suffix Trees
Strings are an important part of most real application multivalued contexts. Their conceptual treatment requires the definition of substring scales, i.e., sets of relevant substrings, so as to form informative concepts. However, these scales are either defined by hand, or derived in a context-unaware manner (e.g., all words occurring in string values). We present an efficient algorithm based on s...
LZ77 Factorisation of Trees
We generalise the fundamental concept of LZ77 factorisation from strings to trees. A tree is represented as a collection of edge-disjoint fragments that either consist of one node or have already occurred earlier (in the BFS order). Similarly as for strings, such a collection uniquely determines the tree, so by minimising the number of fragments we obtain a compressed representation of the tree....
Improving the Speed of LZ77 Compression by Hashing and Suffix Sorting
Two new algorithms for improving the speed of the LZ77 compression are proposed. One is based on a new hashing algorithm named two-level hashing that enables fast longest match searching from a sliding dictionary, and the other uses suffix sorting. The former is suitable for small dictionaries and it significantly improves the speed of gzip, which uses a naive hashing algorithm. The latter is s...
Journal
Journal title: Algorithms
Year: 2021
ISSN: 1999-4893
DOI: https://doi.org/10.3390/a14020044